Feature #4: Query Peak Users
Implementing the "Query Peak Users" feature for our "Cellular Operator AT&T" project.
Description#
The cellular operator serves a rectangular region, represented by a 2D grid with one base station in each cell. Each base station is limited in the number of simultaneous users. During busy hours, some users get poor or no service. The company leadership wants to fix this problem by deploying multiple base stations, wherever necessary. Since this is a capital-intensive plan, we want to systematically deploy new base stations only where they are bound to help the most.
The operator has gathered the peak number of users connected to each base station. The executives want to query the obtained data and calculate the peak number of users that are connected to the base stations in a rectangular subset of the covered area. The executives may need to query the data multiple times. Therefore, we need to figure out an efficient way to execute the queries and return their results.
Note: The area to be queried is identified by the top-left and bottom-right cell coordinates.
The following illustration will help clarify to this problem:
1 of 2
2 of 2
Solution#
A beginner’s way of solving this problem is to use nested loops to iterate over the rectangular region, represented by the top-left and bottom-right coordinates, and sum up all the elements in that region.
Since the executives may want to query the data multiple times, it would make sense for us to perform some pre-processing.
Caching#
We could use extra space for speeding up the algorithm by pre-calculating all the possible rectangular regions and storing them in a hash map. This way, all the queries will take constant time. However, the space complexity will blow up, because we have to store all the possible regions.
Let’s see how we can build on this intuition of caching the results.
Caching rows#
Imagine pre-computing the cumulative sum of the elements for each row in the grid. We can derive a formula that will calculate the cumulative sum for each individual row , ranging from (inclusive):
Here, ranges from to the total number of columns, in the input grid users. We will calculate the prefix sum of all the cells in a row and store them at the appropriate cell in a 2D grid cache of order . We can simplify the notation given above and make use of the sum calculated in the last iteration:
Note: We can observe from our formulation that the
cachegrid will have an additional column, which will contain only zeros.
Let’s review the following illustration to understand this formulation better:
1 of 11
2 of 11
3 of 11
4 of 11
5 of 11
6 of 11
7 of 11
8 of 11
9 of 11
10 of 11
11 of 11
Once we have cached the prefix sums, we can accumulate the sum in the region row by row. The subset region is identified by the starting and ending cell coordinates, and . We can derive a formula for calculating the sum as well:
As the contains the prefix sum of the elements up to that column, that is, (exclusive), we will need to subtract the cells that are not part of the subset region, which is, .
Let’s review an example of this:
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
Smart caching#
Similar to our approach from above, where we pre-calculated the prefix sum for a row, we can pre-compute a cumulative region sum with respect to the origin at .
For a cell that is represented by its coordinates , we will compute the cumulative region sum from the origin at and cache that using the following formula:
We can use the formulation given above for pre-calculating the cumulative sum for all the regions. However, we can do this more efficiently by making use of the cumulative sum of the regions that has already been calculated:
Here and range from to the , respectively, and the output cache is of order
Note: For the sake of simplicity and making the calculations easier to understand, we will set all the elements in the zeroth row and the column of the
cacheto zero.
Let’s review an illustration that represents the cumulative sum of the region:
1 of 11
2 of 11
3 of 11
4 of 11
5 of 11
6 of 11
7 of 11
8 of 11
9 of 11
10 of 11
11 of 11
The cumulative sum of the region is calculated with respect to the origin. Therefore, we have to ensure that the sub-regions that are not part of the queried region are subtracted. We can extract the sum for the region, which is identified by the coordinates and , from the cache with the following formula:
Note: We used the principle of inclusion-exclusion here.
We will add the cumulative sum of the region , because this region is covered twice by both and .
Let’s review an example of this:
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
Complexity measures#
| Method | Time complexity | Space complexity |
|---|---|---|
brute_force |
||
cache_rows |
||
cache_smart |
Here and will represent the number of rows and columns, respectively.
Time complexity#
Let’s look at the time complexities for the methods, one by one:
brute_force: time per query.cache_rows: time per query, time pre-computation- The pre-computation in the constructor will take time.
- The
cache_rowsquery will take time.
cache_smart: time per query, time pre-computation- The pre-computation in the constructor will take time.
- Each
cache_smartquery will take time.
Space complexity#
Similarly, for the space complexity:
brute_force: Since the algorithm only uses a constant amount of additional space, the space complexity will be .cache_rows: The algorithm will use space to store the cumulative sum of all the rows.cache_smart: The algorithm will use space to store the cumulative sum of the region.
Feature #3: Power Up the Station
Feature #5: Densest Deployment